The principle of minimum cross-entropy (minimum directed divergence) is summarized,
discussed, and applied to the classical problem of estimating power spectra given samples of the
autocorrelation function. This new approach reduces to maximum entropy spectral analysis
(MESA) in certain special cases, and thereby provides a fundamental derivation of MESA. In
contrast to MESA, the minimum cross-entropy approach makes use of prior information about
the power spectrum. Depending on the extent of prior information, various alternative minimum
cross-entropy spectral estimates are obtained. When a prior estimate of the power spectrum is
available, the minimum cross-entropy result differs from the MESA result. Results are derived
in two equivalent ways: once by minimizing the cross-entropy of underlying probability densities,
and once by arguments concerning the cross-entropy between the input and output of linear
filters.

I. INTRODUCTION


Work reported in /1/ has shown that the principle of minimum cross-entropy
(minimum directed divergence) provides a correct, general method of inductive
inference in terms of continuous probability densities when given a prior
density and information about the "true" density in the form of expected
values. In this paper, I show how cross-entropy minimization can be used to
estimate power spectra when given a prior estimate of the spectrum and new
information in the form of autocorrelation function samples. This new
approach reduces to maximum entropy spectral analysis /2/ in certain special
cases, and can be thought of as providing a fundamental derivation of the
maximum entropy technique.

A. Maximum Entropy Spectral Analysis (MESA)

Because the power spectrum S(f) of a band-limited, stationary process is
related to its autocorrelation function R(t) by a Fourier transform, and
because it is relatively easy to measure the autocorrelation function, many
spectral analysis techniques start with samples of the autocorrelation
function. The classical approach uses spectral window functions /3/. In this
approach one takes the Fourier transform of the product R(t)W(t), where R(t)
is the measured autocorrelation function in the range |t| < T, and where W(t)
is a known window function with W(t) = 0 for |t| > T. One then estimates the
unknown power spectrum S(f) by exploiting the convolution theorem, which
states that the Fourier transform of the product of two time domain functions
is equal to the convolution in the frequency domain of their Fourier
transforms. Although mathematically elegant, the classical procedure can be
seen to distort the known values of R(t), |t| < T, and to assume that R(t) = 0
in the unknown region |t| > T, despite the fact that R(t) cannot in general be
zero everywhere in this region. An alternative approach is to extend R(t) so
as to take on reasonable values in the unknown region |t| > T and to estimate
S(f) by taking the Fourier transform of the resulting extended function. As a
general approach this seems more reasonable than the classical approach, but
it leaves open the question of how to extend the measured portion of R(t).

Note: Manuscript submitted November 29, 1978.

In proposing the technique called Maximum Entropy Spectral Analysis
(MESA), Burg /2/ suggested that R(t) be extended in a manner that maximizes
the entropy of the underlying stationary process. Specifically, Burg proposed
that the power spectrum S(f) be estimated by maximizing

$$\int_{0}^{W} df \, \log(S(f)) \tag{1}$$

subject to the known constraints

$$R(t_k) = \int_{-W}^{W} df \, S(f) \exp(2\pi i t_k f), \tag{2}$$

where W is the bandwidth, and where $R(t_k)$, $k = 1, 2, \ldots, m$, are known samples of
the autocorrelation function.

Maximum entropy spectral analysis can be seen /4/ as an application of
Jaynes's maximum entropy principle /5/, which applies to situations in which
one wishes to estimate or guess at unknown probabilities $q^\dagger(x_j)$ when
given a set of expected values $\bar g_k$,

$$\bar g_k = \sum_j q^\dagger(x_j) \, g_k(x_j), \tag{3}$$

$k = 1, 2, \ldots, m$. The maximum entropy principle states that, of all the
distributions that satisfy the constraints (3), one should choose the one with
the largest entropy

$$H(q) = -\sum_j q(x_j) \log(q(x_j)). \tag{4}$$

Intuitively, the maximum entropy principle follows from the fact that, to
within the choice of logarithmic base, entropy (4) is a unique measure of the
uncertainty represented by the distribution $q(x_j)$ /6/, /7/. Jaynes argues
that the maximum entropy distribution is "the only unbiased assignment we can
make; to use any other would amount to arbitrary assumption of information
which by hypothesis we do not have" /5, p. 623/. Similarly, the maximum
entropy distribution "agrees with what is known, but expresses 'maximum
uncertainty' with respect to all other matters" /8, p. 231/.

The maximum entropy principle is applied somewhat indirectly in MESA. The
expression (1) is the entropy gain in a stochastic process that is passed
through a linear filter with characteristic function Y(f), where
$S(f) = |Y(f)|^2$ (see /6, pp. 93-95/, /14, pp. 412-414/, /26, p. 243/). If the
input process is white noise, then the output process has spectral power
density S(f). This suggests that the process entropy can be maximized by
maximizing the entropy gain of the filter that produces the process. Thus,
(1) is maximized subject to the constraints (2).

B.  Limited  Acceptance  of  MESA  Viewpoint 

Burg's proposal /2/ led to a variety of practical and useful spectral
estimation algorithms /9/-/18/, but it seems fair to say that, despite its
strong intuitive appeal, MESA has not had widespread acceptance. The reasons
for this appear to go beyond the natural inertia that results from familiarity
with the long-standing, classical approach, particularly since MESA is known
to be equivalent to minimum least-squares estimation /10/, /19/.

I believe that much of the resistance to the MESA viewpoint stems from
doubt about the validity of the maximum entropy principle, which has remained
controversial /20/-/25/ despite numerous successful applications (see /1/).
To some, entropy's properties as an information measure make it obvious that
entropy maximization is the correct way to account for constraint
information. To others, such an informal and intuitive justification yields
plausibility for the maximum entropy principle, but not proof: why maximize
entropy, why not some other function? Moreover, even if one accepts the
maximum entropy principle, there are well-known problems /7/ with extending it
from (3)-(4) to the continuous case. Such an extension is required, since
derivations of (1) deal with continuous probability densities (/6, pp. 93-95/,
/14, pp. 412-414/, /26, p. 243/). Some of the resistance to MESA may also stem
from the fact that the maximum entropy principle is applied indirectly in
terms of filtering rather than directly in terms of underlying probability
densities.

All of these hesitations can be addressed in light of the results for
cross-entropy minimization that were obtained in /1/.

C. Outline

Section II summarizes the principle of minimum cross-entropy and discusses
informally the sense in which this principle provides a correct, general
method of inductive inference /1/. In Section III, I describe stochastic
signals in terms of frequency domain probability densities, I derive the
minimum cross-entropy density given known expected spectral powers, and I
discuss two different possible densities for white noise: one uniform
probability density and one non-uniform density. In Section IV, I derive the
cross-entropy between the input and output of a linear filter and show that
the resulting expression reduces to (1) when the input is one of the white
noise densities introduced in Section III but not when the input is the other
one. Section V contains derivations of minimum cross-entropy densities and
corresponding power spectrum estimates when given information in the form of
autocorrelation samples for cases both with and without previous estimates of
the power spectrum. The derivations are carried out twice: once directly
in terms of the underlying probability densities, and once indirectly in terms
of linear filters. The results are compared with those of MESA in Section
VI. Some remarks about possible algorithms (Section VII) are then followed by
brief conclusions (Section VIII).


II.  CROSS  ENTROPY  MINIMIZATION 


A.  A General  Inference  Problem  Involving  Probability  Densities 

Let x denote a single state of some system that has a set D of possible
system states and a probability density $q^\dagger(x)$ of states. Let $\mathcal{D}$ be the set of
all probability densities q on D such that $q(x) \geq 0$ for $x \in D$ and

$$\int dx \, q(x) = 1. \tag{5}$$

We assume that the existence of $q^\dagger \in \mathcal{D}$ is known but that $q^\dagger$ itself is unknown.
The density $q^\dagger$ is sometimes known as a "true" density.

Suppose $p \in \mathcal{D}$ is a prior density that is our current estimate of $q^\dagger$, and
suppose we gain new information about $q^\dagger$ in the form of a set of expected
values

$$\int dx \, q^\dagger(x) \, g_r(x) = \bar g_r, \tag{6}$$

for a known set of bounded functions $g_r(x)$ and numbers $\bar g_r$, $r = 1, \ldots, m$.
Now, because the constraints (6) do not determine $q^\dagger$ completely, they are
satisfied not only by $q^\dagger$ but by some subset of densities $\mathcal{I} \subseteq \mathcal{D}$. Which single
density should we choose from this subset to be our new estimate of $q^\dagger$, and
how should we use the prior p and the new information (6) in making this
choice?

B.  The  Principle  of  Minimum  Cross-Entropy 

The solution to this inference problem is obtained by minimizing a
functional H(q,p) called cross-entropy,

$$H(q,p) = \int dx \, q(x) \log(q(x)/p(x)). \tag{7}$$

Specifically, of all the densities $q' \in \mathcal{I}$ that satisfy the constraints (6), we
choose the one with the smallest cross-entropy H(q',p) with respect to the
prior p. Stated differently, the posterior density q satisfies

$$H(q,p) = \min_{q' \in \mathcal{I}} H(q',p), \tag{8}$$

where $\mathcal{I} \subseteq \mathcal{D}$ comprises all of the densities that satisfy the constraints (6).

Mathematically, the solution is obtained using the method of Lagrangian
multipliers and standard techniques from the calculus of variations. The
minimization condition is

$$\log(q(x)/p(x)) + 1 + \lambda_0 + \sum_r \beta_r g_r(x) = 0, \tag{9}$$

where the $\beta_r$ are Lagrangian multipliers corresponding to the constraints
(6), and where $\lambda_0$ is a Lagrangian multiplier corresponding to the
normalization constraint (5). The solution of (9) is

$$q(x) = p(x) \exp\Bigl(-\lambda - \sum_r \beta_r g_r(x)\Bigr), \tag{10}$$

where $\lambda = \lambda_0 + 1$. It is convenient to write (10) in the form

$$q(x) = Z^{-1} p(x) \exp\Bigl(-\sum_r \beta_r g_r(x)\Bigr), \tag{11}$$

where Z is the "partition function",

$$Z = \exp(\lambda) = \int dx \, p(x) \exp\Bigl(-\sum_r \beta_r g_r(x)\Bigr). \tag{12}$$

The values of the multipliers $\beta_r$ are determined by the known expectation
values in (6). One can express the posterior q directly in terms of the
values $\bar g_r$ by solving the equations

$$\bar g_r = -Z^{-1} \, \frac{\partial Z}{\partial \beta_r} \tag{13}$$

for the $\beta_r$ or by substituting (11) into the constraint equations (6) and
solving for the $\beta_r$. Such solutions are often difficult or impossible to
obtain analytically, but one can obtain them computationally in general
/1, Appendix B/, /27/.
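
The computational route can be illustrated on a finite state space. The following is only a minimal sketch under stated assumptions, not the procedure of /1, Appendix B/ or /27/: it assumes a discrete set of states, a single constraint function g, and a multiplier bracketed by [-50, 50], and it finds the multiplier by bisection, using the fact that the expected value of g under (11) decreases monotonically as the multiplier grows. The names `min_cross_entropy` and `cross_entropy`, and the die example, are hypothetical illustrations.

```python
import math


def min_cross_entropy(prior, g, g_bar, lo=-50.0, hi=50.0, iters=200):
    """Discrete analogue of (11): q_j = p_j * exp(-beta * g_j) / Z.

    Finds beta by bisection so that sum_j q_j * g_j = g_bar.
    Assumes g_bar lies strictly between min(g) and max(g).
    """
    def posterior(beta):
        weights = [p * math.exp(-beta * gj) for p, gj in zip(prior, g)]
        z = sum(weights)  # partition function Z, as in (12)
        return [w / z for w in weights]

    def mean_g(beta):
        return sum(qj * gj for qj, gj in zip(posterior(beta), g))

    # mean_g decreases as beta increases, so bisect on mean_g(beta) - g_bar.
    for _ in range(iters):
        mid = 0.5 * (lo + hi)
        if mean_g(mid) > g_bar:
            lo = mid
        else:
            hi = mid
    beta = 0.5 * (lo + hi)
    return beta, posterior(beta)


def cross_entropy(q, p):
    """Discrete form of (7); terms with q_j = 0 contribute nothing."""
    return sum(qj * math.log(qj / pj) for qj, pj in zip(q, p) if qj > 0)


# Illustrative example: a six-sided die, uniform prior, known mean of 4.5.
faces = [1, 2, 3, 4, 5, 6]
prior = [1.0 / 6] * 6
beta, q = min_cross_entropy(prior, faces, 4.5)
```

With a uniform prior, as in this example, the result coincides with the maximum entropy distribution for the same constraint; a non-uniform `prior` list yields the corresponding minimum cross-entropy posterior.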

The principle of minimum cross-entropy also applies when, in addition to
equality constraints (6), we gain new information about $q^\dagger$ in the form of a
bound on an expected value,

$$\int dx \, q^\dagger(x) \, g(x) \geq (\leq) \; \bar g. \tag{14}$$

Such an inequality constraint is handled as follows: First one solves for the
minimum cross-entropy density given only the equality constraints (6). If the
resulting density happens to satisfy (14), then this density is the overall
solution. If (14) is not satisfied, then the overall solution is the minimum
cross-entropy density given (6) and the additional equality constraint $\langle g \rangle = \bar g$.
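
The two-step procedure for an inequality constraint can be sketched as follows. This is an illustrative sketch only: it assumes a finite state space, an upper bound on a single expected value, and no equality constraints (6) at all, so that the first step simply returns the prior. The function names are hypothetical, and the bisection bracket [-50, 50] for the multiplier is an assumption.

```python
import math


def tilt(prior, g, beta):
    """Discrete form of (11) for a single constraint function g."""
    weights = [p * math.exp(-beta * gj) for p, gj in zip(prior, g)]
    z = sum(weights)
    return [w / z for w in weights]


def expect(q, g):
    """Expected value of g under the distribution q."""
    return sum(qj * gj for qj, gj in zip(q, g))


def solve_equality(prior, g, g_bar, lo=-50.0, hi=50.0, iters=200):
    """Bisection for the multiplier that makes the expected value equal g_bar."""
    for _ in range(iters):
        mid = 0.5 * (lo + hi)
        if expect(tilt(prior, g, mid), g) > g_bar:
            lo = mid
        else:
            hi = mid
    return tilt(prior, g, 0.5 * (lo + hi))


def posterior_with_upper_bound(prior, g, g_max):
    """Handle the bound <g> <= g_max when there are no equality constraints.

    Step 1: with no equality constraints, the minimum cross-entropy
    density is the prior itself; keep it if it satisfies the bound.
    Step 2: otherwise impose the bound as an equality constraint.
    """
    if expect(prior, g) <= g_max:
        return list(prior)
    return solve_equality(prior, g, g_max)


# Illustrative example: a uniform prior on die faces has mean 3.5.
faces = [1, 2, 3, 4, 5, 6]
prior = [1.0 / 6] * 6
loose = posterior_with_upper_bound(prior, faces, 4.0)  # bound already satisfied
tight = posterior_with_upper_bound(prior, faces, 3.0)  # bound becomes an equality
```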

C. Background and Justification of Cross-Entropy Minimization

Cross-entropy goes by other names, including expected weight of evidence
/28, p. 72/, directed divergence /29, p. 6/, and relative entropy /20/. The
term cross-entropy is due to Good /30/. The principle of minimum
cross-entropy was first proposed by Kullback /29, p. 37/, who called it a
principle of minimum directed divergence or minimum discrimination
information. It has been advocated in various forms by others /30/, /31/,
/32/, including Jaynes /8/, /33/, who showed that generalizing entropy
maximization to continuous densities leads to (7) with p(x) being called an
"invariant measure" instead of a prior density. Since entropy maximization
does not deal with prior densities (there being an implicit assumption of
uniform priors), this just expresses the fact that a uniform prior in one
coordinate system may not be uniform in another. Cross-entropy minimization
has been applied primarily to statistics /29/, /30/, /34/, but also to
statistical mechanics /35/, chemistry /36/, pattern recognition /37/, /38/,
and the computer storage of probability distributions /39/.

Like entropy, cross-entropy can be characterized axiomatically /32/. Its
properties are desirable for an information measure /31/, /32/, and it can be
argued /40/ that cross-entropy measures the amount of information necessary to
change a prior p into the posterior q. The principle of cross-entropy
minimization then follows intuitively, much like entropy maximization. In /1/
we argued that such justifications are weak, not only because they rely on
informal, intuitive arguments, but also because they are indirect: they are
based on a formal description of what is required of an information measure
rather than on a formal description of what is required of a method for taking
new information into account.

Our approach in /1/ was to formalize the requirements of inductive
inference  directly  in  terms  of  a set  of  consistency  axioms  that  make  no 
reference  to  information  measures  or  properties  of  information  measures.  All 
of  the  axioms  are  based  on  a single  fundamental  principle:  If  a problem  can  be 
solved  in  more  than  one  way,  the  results  should  be  consistent.  Informally, 
the  axioms  may  be  phrased  as  follows: 

1) Uniqueness. The results of taking new information into account should
be unique.

2) Invariance. It shouldn't matter in which coordinate system we account
for new information.

3) System independence. It shouldn't matter whether we account for
independent information about independent systems separately in terms
of different probability densities or together in terms of a joint
density.

4) Subset independence. It shouldn't matter whether we account for
information about an independent subset of system states in terms of a
separate conditional density or in terms of the full system density.

We were then able to prove /1/ that the principle of minimum cross-entropy
provides  a correct,  general  method  of  inductive  inference  in  the  following 
sense:  Given  a prior  density  and  new  information  in  the  form  of  constraints 
on  expected  values,  there  is  only  one  posterior  density  satisfying  these 
constraints  that  can  be  chosen  in  a manner  that  satisfies  the  axioms;  this 
unique  posterior  can  be  obtained  by  minimizing  cross-entropy. 


III.  MINIMUM  CROSS-ENTROPY  PROBABILITY  DENSITIES  FOR  STOCHASTIC  SIGNALS 


A.  Power  Spectrum  Probability  Densities 


Consider time-domain signals of the form

$$s(t) = \sum_{k=1}^{n} a_k \cos(\omega_k t) + b_k \sin(\omega_k t), \tag{15}$$

with non-zero $\omega_k$ that need not be uniformly spaced. These are
discrete-spectrum, band-limited signals without DC components. (The
assumption of no DC term, which is reasonable for many signal processing
applications, is made for mathematical convenience.) The
power at each frequency is given by the variables $x_k$,

$$x_k = a_k^2 + b_k^2. \tag{16}$$

If we consider the $x_k$ to be random variables, we may describe a stochastic
signal in terms of a joint probability density q(x), where we write x for
$x_1, x_2, \ldots, x_n$. Instead of constantly referring to q(x) as the spectral
power probability density of a stochastic signal, we will informally refer to
q(x) as a "signal."

B.  Minimum  Cross-Entropy  Densities  Given  Expected  Spectral  Powers 

Consider first the problem of choosing q(x) when we know the total
expected power per discrete frequency

$$P = \frac{1}{n} \int dx \Bigl(\sum_k x_k\Bigr) q(x), \tag{17}$$

where $dx = dx_1 dx_2 \cdots dx_n$. To apply the principle of minimum
cross-entropy, we need a prior density p(x) to represent our state of
knowledge before we learn even (17). Since in any real situation there will
be a physical limit on the magnitude of the $x_k$, we assume that the domain of
x is bounded. We may therefore use a uniform prior density. In general,
whether or not it is valid to assume a uniform prior density for continuous
probability densities is a difficult question /8/. Therefore, although we
assume a uniform prior p(x) in the following, we shall consider the question
more carefully later in this section.

We choose q(x) by minimizing cross-entropy subject to the constraints (5)
and (17). The result (see (10)) is

$$q(x) = A \exp\Bigl(-\beta \sum_k x_k\Bigr),$$

where $\beta$ is the Lagrangian multiplier corresponding to (17), and where the
uniform prior and the Lagrangian multiplier corresponding to (5) have been
absorbed into the constant A,

$$A^{-1} = \int dx \, \exp\Bigl(-\beta \sum_k x_k\Bigr). \tag{18}$$

Provided that P is much less than the maximum value of the $x_k$, we may use
integration limits of $(0, \infty)$ in (18); this leads to $A = \beta^n$. In terms of
the multiplier $\beta$, the total power constraint (17) becomes

$$\beta = 1/P.$$

The posterior q(x) is therefore

$$q(x) = \prod_{k=1}^{n} (1/P) \exp(-x_k/P). \tag{19}$$

Thus, q(x) is a multivariate exponential: each spectral power $x_k$ is
exponentially distributed with mean P.


Now consider the more general problem in which we learn the expected
spectral power at each frequency,

$$P_k = \int dx \, x_k \, q(x). \tag{20}$$

Again using a uniform prior, the minimum cross-entropy posterior in this
case is

$$q(x) = \prod_{k=1}^{n} (1/P_k) \exp(-x_k/P_k) \tag{21}$$

(the derivation is similar to that of (19)). In fact, the same posterior (21)
results if (19) is used as a prior instead of a uniform prior.

We now return to the question of the uniform prior. One might wonder how
differently (21) might have turned out had we described the signal originally
as a probability density in the variables $a_k$, $b_k$ (see (15)) and used a
prior that was uniform in $a_k$, $b_k$ rather than uniform in the variables $x_k$
(see (16)). In this case the constraints (20) take on the form

$$P_k = \langle a_k^2 + b_k^2 \rangle = \int da \, db \, (a_k^2 + b_k^2) \, q(a,b). \tag{22}$$

With a uniform prior p(a,b), the result of cross-entropy minimization is

$$q(a,b) = A \exp\Bigl(-\sum_k \beta_k (a_k^2 + b_k^2)\Bigr).$$

Solving for A, using (5), and for the multipliers $\beta_k$, using (22), yields

$$q(a,b) = \prod_{k=1}^{n} \frac{1}{\pi P_k} \exp\bigl(-(a_k^2 + b_k^2)/P_k\bigr). \tag{23}$$

Thus the variables $a_k$ and $b_k$ have Gaussian distributions with zero means
and variances $P_k/2$. Since the variances correspond to power expectations
$\langle a_k^2 \rangle$ or $\langle b_k^2 \rangle$, this just shows that the expected power is
divided evenly between the two quadrature components.

Our next step is to transform (23) to a density in terms of the
variables $x_k$ and to compare the result with (21). We begin by transforming
from $(a_k, b_k)$ coordinates to $(r_k, \theta_k)$ coordinates, where

$$r_k = (a_k^2 + b_k^2)^{1/2}$$

and

$$\theta_k = \tan^{-1}(b_k/a_k).$$

The volume elements in the two coordinate systems are related by
$da_k \, db_k = r_k \, dr_k \, d\theta_k$. Since $q(a,b) \, da \, db = q(r,\theta) \, dr \, d\theta$, it follows
that

$$q(r,\theta) = \prod_k \frac{r_k}{\pi P_k} \exp(-r_k^2/P_k)$$

and

$$q(r) = \prod_k \frac{2 r_k}{P_k} \exp(-r_k^2/P_k) \tag{24}$$

hold, where in (24) we have integrated over the $\theta_k$ coordinates. Now $r_k$
and $x_k$ are related by $x_k = r_k^2$, so that $dx_k = 2 r_k \, dr_k$. Since
$q(x) \, dx = q(r) \, dr$, it follows from (24) that q(x) is given by

$$q(x) = \prod_{k=1}^{n} (1/P_k) \exp(-x_k/P_k),$$

which is the same as (21). Thus, when given information in the form of
expected spectral powers, it doesn't matter whether the prior density is
assumed to be uniform in the amplitude variables $a_k$, $b_k$ or in the power
variables $x_k$. The result is a multivariate exponential in the variables
$x_k$ or a multivariate Gaussian in the variables $a_k$, $b_k$.
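
The equivalence between Gaussian amplitudes and exponentially distributed powers can be checked numerically. The sketch below is a Monte Carlo illustration, not part of the derivation: it draws the two quadrature amplitudes for a single frequency from zero-mean Gaussians with variance P/2, as in (23), and compares the resulting power against two properties of an exponential density with mean P. The value P = 3.0, the sample count, and the seed are arbitrary assumptions.

```python
import math
import random


def sample_powers(P, n_samples, seed=0):
    """Draw x = a^2 + b^2 with a, b independent Gaussians of variance P/2."""
    rng = random.Random(seed)
    sigma = math.sqrt(P / 2.0)
    return [rng.gauss(0.0, sigma) ** 2 + rng.gauss(0.0, sigma) ** 2
            for _ in range(n_samples)]


P = 3.0
xs = sample_powers(P, 200_000)

# For an exponential density with mean P, the mean is P and the
# probability that x exceeds P is exp(-1).
mean_power = sum(xs) / len(xs)
tail_fraction = sum(1 for x in xs if x > P) / len(xs)
expected_tail = math.exp(-1.0)
```

Both `mean_power` and `tail_fraction` should agree with P and `expected_tail` up to sampling error, which is what (21) predicts for each spectral power.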

One other alternative that needs consideration is to work in the $(r, \theta)$
coordinates and to use a prior $p(r, \theta)$ that is uniform with respect to the
volume element $dr \, d\theta$. Integrating over $\theta$ and transforming to the x
coordinates leads to a non-uniform prior

$$p(x) \propto \prod_{k=1}^{n} x_k^{-1/2}.$$

This contrasts with the choice of a uniform prior p(a,b), which corresponds
to a uniform prior p(x). Since there is no reason to have a non-uniform prior
p(x), we reject the possibility of a uniform prior $p(r, \theta)$.

C.  Spectral  Probability  Densities  for  White  Noise 

By "white", we mean that the expected spectral powers $\langle x_k \rangle$ are all
equal. What probability density should we use to represent white noise?
There are two obvious possibilities. The first is the uniform prior p(x)
itself, for which $\langle x_k \rangle = x_{\max}/2$, where $x_{\max}$ is the maximum value of the
$x_k$. This is appealing because it doesn't require any additional information
beyond the prior. Sometimes, however, we may know the total power per
discrete frequency of the white noise signal, in which case (19) would seem to
be the appropriate probability density. We shall refer to the first of these
two possibilities as "uniform white noise" and to the second as "Gaussian
white noise."

Under some circumstances one might be willing to argue that, although we
don't know the total power per discrete frequency of the noise signal, we do
know an upper limit. Stated differently, not only do we know the limit $x_{\max}$
but we know a limit for the quantity

$$P(q) = \frac{1}{n} \int dx \Bigl(\sum_k x_k\Bigr) q(x),$$

namely

$$P(q) \leq P_{\max}. \tag{25}$$

As mentioned in Section II, the procedure in this case is to see if (25) is
satisfied without imposing a specific constraint, and, if not, to impose the
equality constraint $P = P_{\max}$. The inequality constraint (25) will be
satisfied by the uniform prior p(x) if

$$x_{\max}/2 \leq P_{\max}. \tag{26}$$

If (26) does not hold, it follows from (19) that the appropriate density is

$$q(x) = \prod_{k=1}^{n} (1/P_{\max}) \exp(-x_k/P_{\max}). \tag{27}$$

If $x_{\max}$ reflects knowledge about some kind of physical limit while $P_{\max}$
reflects knowledge about power limitations of the signal source, then it seems
likely that (26) won't be satisfied, which means that the Gaussian white noise
density (27) should be used.


IV. CROSS-ENTROPY BETWEEN INPUT AND OUTPUT OF LINEAR FILTERS


Suppose a signal with probability density p(x) is passed through a linear
filter with characteristic function Y(f). Then the magnitude of each
power spectrum component is changed by the factor

$$S_k = |Y(f_k)|^2,$$

where $f_k = \omega_k/2\pi$. If q(x) is the probability density of the signal
that results from passing p(x) through the filter, then the input p(x) and the
output q(x) are related by

$$q(x) = \frac{p(x_1/S_1, x_2/S_2, \ldots, x_n/S_n)}{S_1 S_2 \cdots S_n}.$$

The filter has the effect of a linear coordinate transformation. The
cross-entropy between the input and the output is

$$H(q,p) = \int dx \, q(x) \log(q(x)/p(x))
       = \int dy \, p(y_1, \ldots, y_n) \log\frac{p(y_1, \ldots, y_n)}{p(y_1 S_1, \ldots, y_n S_n)} - \sum_k \log(S_k), \tag{28}$$

where $y_k = x_k/S_k$.

Eq. (28) is a general result for any input signal p(x). Now we evaluate
(28) for the special case in which p(x) is uniform and for the special case in
which p(x) is exponential. When the filter input p(x) is uniform, the first
term in (28) is zero, and the cross-entropy between input and output is

$$H(q,p) = -\sum_k \log(S_k). \tag{29}$$

Notice that, except for sign, this is the discrete form of the expression (1).
When the filter input has the exponential form (21), which in terms of the
spectral amplitudes $a_k$, $b_k$ is Gaussian, the cross-entropy (28) becomes

$$H(q,p) = \int dy \, p(y) \sum_k (y_k S_k - y_k)/P_k - \sum_k \log(S_k),$$

$$H(q,p) = -\sum_k \bigl(1 - S_k + \log(S_k)\bigr), \tag{30}$$

since $\int dy \, y_k \, p(y) = P_k$. Notice that this result is independent of the
particular $P_k$ values.
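
The closed form for an exponential input can be checked by direct numerical integration. The sketch below is an illustration under stated assumptions: because the densities factor over the frequencies, the cross-entropy is computed one component at a time with a simple midpoint rule, and the integration range is truncated at 60 times the larger mean. The filter gains `S` and the two sets of prior powers are arbitrary example values, and the function names are hypothetical.

```python
import math


def kl_exponential_numeric(mean_q, mean_p, steps=100_000, upper_mult=60.0):
    """Midpoint-rule estimate of the integral of q*log(q/p) for two
    exponential densities with the given means (truncated upper limit)."""
    upper = upper_mult * max(mean_q, mean_p)
    h = upper / steps
    total = 0.0
    for i in range(steps):
        x = (i + 0.5) * h
        log_q = -math.log(mean_q) - x / mean_q
        log_p = -math.log(mean_p) - x / mean_p
        total += math.exp(log_q) * (log_q - log_p) * h
    return total


def filter_cross_entropy(S):
    """Closed form: sum over k of (S_k - 1 - log S_k)."""
    return sum(s - 1.0 - math.log(s) for s in S)


S = [0.5, 2.0, 1.0]        # example filter power gains S_k
P_first = [1.0, 4.0, 0.3]  # one choice of input powers P_k
P_second = [7.0, 0.2, 2.5]  # a different choice of input powers

# The output component k is exponential with mean S_k * P_k.
numeric_first = sum(kl_exponential_numeric(s * p, p) for s, p in zip(S, P_first))
numeric_second = sum(kl_exponential_numeric(s * p, p) for s, p in zip(S, P_second))
closed = filter_cross_entropy(S)
```

Both numerical values should agree with `closed`, which illustrates that the result does not depend on the particular input powers.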


V.  MINIMUM  CROSS-ENTROPY  POWER  SPECTRA  GIVEN  AUTOCORRELATION  INFORMATION 


Let some unknown signal $q^\dagger(x)$ have a power spectrum G(f) and
autocorrelation function R(t). Suppose we obtain information about G in the
form of a set of samples of the autocorrelation function $R(t_r)$,

$$R_r = R(t_r) = \int df \, G(f) \exp(2\pi i t_r f), \tag{31}$$

$r = 1, \ldots, m$. We do not assume that the $t_r$ are equally spaced. If the
frequency spectrum is discrete, as we have assumed in (15), we can express
G(f) as

$$G(f) = \sum_{k=-n}^{n} G_k \, \delta(f - f_k),$$

where $f_{-k} = -f_k$, $G_k = G_{-k} = G(f_k)$, and $G_0 = 0$. Then (31) becomes

$$R_r = \sum_{k=-n}^{n} G_k \exp(2\pi i t_r f_k),$$

which we prefer to express in the non-complex form

$$R_r = \sum_{k=1}^{n} G_k c_{rk}, \tag{32}$$

where

$$c_{rk} = 2 \cos(2\pi t_r f_k). \tag{33}$$

Since the $G_k$ satisfy

$$G_k = \int dx \, x_k \, q^\dagger(x), \tag{34}$$

we can rewrite (32) as

$$R_r = \int dx \Bigl(\sum_k x_k c_{rk}\Bigr) q^\dagger(x). \tag{35}$$

This has the form of known expected values of the unknown density $q^\dagger(x)$, and
we may therefore use the principle of minimum cross-entropy to infer an
estimate of $q^\dagger$. In terms of the general form (6), the functions $g_r$ are
$g_r(x) = \sum_k x_k c_{rk}$. This minimum cross-entropy problem differs from the
one discussed in Section III in that the Section III problem assumed knowledge
of the expected spectral powers in the form (34), whereas in this problem we
have only the form (35). Since typically $m < n$, knowledge of (35) provides
less information than does (34).

A.  Results  When  No  Prior  Power  Spectrum  Estimate  is  Given 

If we have no prior information about $q^\dagger$, then we use a uniform prior p(x)
as discussed in Section III. We then select a posterior q(x) by minimizing
cross-entropy subject to the autocorrelation constraints (35) and
the normalization constraint (5). The result is

$$q(x) = A \exp\Bigl(-\sum_{r=1}^{m} \beta_r \sum_{k=1}^{n} x_k c_{rk}\Bigr), \tag{36}$$

where the $\beta_r$ are m Lagrangian multipliers corresponding to the
autocorrelation constraints (35). For convenience, we define

$$u_k = \sum_{r=1}^{m} \beta_r c_{rk}, \tag{37}$$

so that (36) can be written as

$$q(x) = A \exp\Bigl(-\sum_{k=1}^{n} u_k x_k\Bigr). \tag{38}$$

Solving for A in the usual way yields

$$q(x) = \prod_{k=1}^{n} u_k \exp(-u_k x_k). \tag{39}$$

For our posterior estimate $Q_k$ of the power spectrum, we use the density (39)
to compute $Q_k = \langle x_k \rangle = 1/u_k$, or

$$Q_k = \frac{1}{\sum_{r=1}^{m} \beta_r c_{rk}}, \tag{40}$$

where the multipliers $\beta_r$ are determined by the constraints (32),

$$R_r = \sum_{k=1}^{n} c_{rk} \Bigl[\sum_{s=1}^{m} \beta_s c_{sk}\Bigr]^{-1}. \tag{41}$$

The  minimum  cross-entropy  power  spectrum  estimate  (40)  is  identical  to  the 
result  for  MESA,  except  that  the  MESA  equations  are  usually  expressed  in 
complex  form  ( / 10,  p.  9/,  for  example).  In  fact,  one  can  derive  (40)  using  a 
filtering  argument  of  the  kind  usually  used  in  deriving  the  MESA  result.  If  a 
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white  signal  p(x)  is  passed  through  a linear  filter  with  characteristic 

function  Y,  the  power  spectrum  of  the  output  signal  is  given  by 

Qk  = |y( fk) | 2 . If  the  output  signal  is  to  be  an  estimate  of  q^,  then  we 

know  that  the  must  satisfy  (32), 
o 

R = / Q,c.  . (42) 

r k rk 

k-i 

This  suggests  that  we  design  a filter  with  minimum  cross-entropy  between  input 
output,  provided  that  the  output  power  spectrum  satisfies  (42).  Since  one 
interpretation  of  cross-entropy  /40/  is  as  a measure  of  the  information 
necessary  to  transform  the  prior  into  the  posterior,  one  can  think  of  this 
filter  as  the  one  that  produces  the  smallest  information  change  from  the  prior 
while  still  accounting  for  all  known  information.  For  a uniform  white 
prior  p(.x),  the  cross-entropy  between  the  input  and  output  of  the  filter  is 
given  by  (29), 

H(q,p)  = - log(Qk)  (43) 

Hence,  we  minimize  (43)  subject  to  the  constraints  (42).  The  minimization 
condition  is 


$$-(1/Q_k) + \sum_{r=1}^{m} \beta_r c_{rk} = 0 ,$$

and its solution for $Q_k$ is the same as (40). Furthermore, minimizing (43)
subject to (42) is just the discrete version of maximizing (1) subject to (2),
which also shows the equivalence between MESA and minimum cross-entropy
estimation for uniform priors. This equivalence is not surprising, since
cross-entropy minimization is equivalent to entropy maximization in general
for the case of uniform priors /1/.

B. Results When A Prior Power Spectrum Estimate is Given

Now we consider the case in which we obtain the autocorrelation
information (35) when we already have an estimate $P_k$ of the power spectrum
(34). In this case the prior density is not necessarily uniform, as it
must reflect the information in the prior estimate of the power spectrum. The
appropriate prior density is the exponential form (21),

$$p(x) = \prod_{k=1}^{n} (1/P_k) \exp(-x_k/P_k) , \tag{44}$$


which itself is the minimum cross-entropy density, with respect to a uniform
prior, given knowledge of the expected spectral powers $P_k$. If the prior
estimate $P_k$ is the estimate $Q_k$ that was obtained by the method discussed
in the previous subsection, then the appropriate prior density for obtaining a
new estimate given new autocorrelation information is the posterior (39).
Since $u_k = 1/Q_k$, (39) is equivalent to (44).

We therefore solve the problem of estimating $G_k$, given a prior estimate
$P_k$ and new autocorrelation information (35), by assuming the prior density
(44) and minimizing cross-entropy subject to the constraints (35) and
(5). The result is


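$$q(x) = p(x) \exp\Bigl(-\lambda - \sum_{k} u_k x_k\Bigr)
       = e^{-\lambda} \prod_{k=1}^{n} \frac{1}{P_k} \exp\Bigl(-\bigl(u_k + \tfrac{1}{P_k}\bigr) x_k\Bigr) , \tag{45}$$

where the $u_k$ are defined as in (37). Since the value of $\lambda$ must be such
that q(x) satisfies the normalization constraint (5), (45) becomes

$$q(x) = \prod_{k=1}^{n} \bigl(u_k + \tfrac{1}{P_k}\bigr) \exp\Bigl(-\bigl(u_k + \tfrac{1}{P_k}\bigr) x_k\Bigr) . \tag{46}$$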


For our posterior estimate $Q_k$ of the power spectrum, we use the density (46)
to compute $Q_k = \langle x_k \rangle = 1/(u_k + 1/P_k)$, or

$$Q_k = \frac{1}{(1/P_k) + \sum_{r} \beta_r c_{rk}} , \tag{47}$$

where as usual the multipliers $\beta_r$ are determined by the requirement that
the $Q_k$ satisfy the autocorrelation constraints (42).
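
The requirement that the multipliers make (47) satisfy (42) can be illustrated numerically. The following is only a minimal sketch under stated assumptions, not a method given in the text: it uses a Newton iteration started at $\beta_r = 0$ (where $Q_k = P_k$), halves the step whenever a denominator of (47) would become non-positive, and assumes coefficients of the form $c_{rk} = 2\cos(2\pi t_r f_k)$. The function names (`coefficients`, `spectrum`, `solve_multipliers`) and the test values are hypothetical.

```python
import math


def coefficients(lags, freqs):
    """c[r][k] = 2 cos(2 pi t_r f_k); this cosine form is an assumption here."""
    return [[2.0 * math.cos(2.0 * math.pi * t * f) for f in freqs] for t in lags]


def denominators(beta, prior, c):
    """1/P_k + sum_r beta_r c_rk, the denominator of (47), for every k."""
    return [1.0 / p + sum(b * row[k] for b, row in zip(beta, c))
            for k, p in enumerate(prior)]


def spectrum(beta, prior, c):
    """Posterior estimate (47): Q_k = 1 / (1/P_k + sum_r beta_r c_rk)."""
    return [1.0 / d for d in denominators(beta, prior, c)]


def solve_linear(a, b):
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    m = len(b)
    aug = [list(a[i]) + [b[i]] for i in range(m)]
    for col in range(m):
        pivot = max(range(col, m), key=lambda i: abs(aug[i][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        for i in range(col + 1, m):
            factor = aug[i][col] / aug[col][col]
            for j in range(col, m + 1):
                aug[i][j] -= factor * aug[col][j]
    x = [0.0] * m
    for i in reversed(range(m)):
        s = sum(aug[i][j] * x[j] for j in range(i + 1, m))
        x[i] = (aug[i][m] - s) / aug[i][i]
    return x


def solve_multipliers(r_target, prior, c, tol=1e-12, max_iter=50):
    """Newton iteration for the beta_r so that (47) satisfies (42)."""
    m, n = len(c), len(prior)
    beta = [0.0] * m                      # beta = 0 reproduces the prior
    for _ in range(max_iter):
        q = spectrum(beta, prior, c)
        resid = [sum(q[k] * c[r][k] for k in range(n)) - r_target[r]
                 for r in range(m)]
        if max(abs(v) for v in resid) < tol:
            break
        # d resid_r / d beta_j = -sum_k Q_k^2 c_rk c_jk
        jac = [[-sum(q[k] ** 2 * c[r][k] * c[j][k] for k in range(n))
                for j in range(m)] for r in range(m)]
        step = solve_linear(jac, [-v for v in resid])
        scale = 1.0
        trial = [b + s for b, s in zip(beta, step)]
        while min(denominators(trial, prior, c)) <= 0.0:
            scale *= 0.5                  # keep every Q_k positive
            trial = [b + scale * s for b, s in zip(beta, step)]
        beta = trial
    return beta


# Hypothetical example: n = 4 frequencies, lags t = 0 and t = 1.
n = 4
freqs = [k / (2.0 * n) for k in range(1, n + 1)]
c = coefficients([0.0, 1.0], freqs)
prior = [1.0, 2.0, 1.5, 0.5]
q_true = spectrum([0.1, 0.05], prior, c)
r_target = [sum(q_true[k] * c[r][k] for k in range(n)) for r in range(2)]
beta = solve_multipliers(r_target, prior, c)
print(beta, spectrum(beta, prior, c))
```

In this example the autocorrelation values are generated from known multipliers, so the recovered values can be compared against them. A general-purpose solver of this kind makes no use of any special structure in the $c_{rk}$.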

We  can  also  derive  the  result  (47)  by  a filtering  argument  that  is  similar 
to  the  one  given  in  the  previous  subsection  for  the  case  when  no  previous 
estimate was available. Suppose a signal with power spectrum $P_k$ is passed
through a linear filter with characteristic function Y(f). The output power
spectrum will be $Q_k = P_k S_k$, where $S_k = |Y(f_k)|^2$. If the output
power spectrum is to be our new estimate, we know that the $Q_k$ must satisfy
(42). If $P_k$ is our previous estimate, this suggests that we design a
filter with minimum cross-entropy between input and output, given that the
input density satisfies $\langle x_k \rangle = P_k$ and that the output power
spectrum satisfies (42). For input densities of the exponential form (44), the
cross-entropy between input and output is given by (30). Hence, we choose the
$S_k$ so as to minimize (30) subject to the constraints (42), which we rewrite
as

$$R_r = \sum_{k=1}^{n} P_k S_k c_{rk} .$$

The minimization condition is

$$1 - (1/S_k) + P_k \sum_{r=1}^{m} \beta_r c_{rk} = 0 .$$

Solving this for $S_k$ and computing $Q_k = P_k S_k$ yields our previous result
(47).
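
The final step can be written out as follows. This is a worked step added for clarity; it uses only the minimization condition $1 - (1/S_k) + P_k \sum_r \beta_r c_{rk} = 0$ and the abbreviation $u_k = \sum_r \beta_r c_{rk}$ from (37).

```latex
\begin{align*}
1 - \frac{1}{S_k} + P_k u_k = 0
  \quad&\Longrightarrow\quad S_k = \frac{1}{1 + P_k u_k} , \\
Q_k = P_k S_k = \frac{P_k}{1 + P_k u_k}
  &= \frac{1}{(1/P_k) + \sum_{r} \beta_r c_{rk}} .
\end{align*}
```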
VI. COMPARISON OF RESULTS

Given information in the form of autocorrelation samples (35), the minimum
cross-entropy signal probability density has the form

$$q(x) = \prod_{k=1}^{n} (1/Q_k) \exp(-x_k/Q_k) ,$$


where the $Q_k$ are the posterior estimates of the signal power spectrum. In
the case of a uniform white prior, the $Q_k$ are given by (40),

$$Q_k^{(1)} = \frac{1}{\sum_{r} \beta_r^{(1)} c_{rk}} . \tag{48}$$

In the case of a Gaussian white prior, the $Q_k$ are given by (47) with
$P_k = P$ for all k,

$$Q_k^{(2)} = \frac{1}{(1/P) + \sum_{r} \beta_r^{(2)} c_{rk}} . \tag{49}$$


As discussed in Section III, P is the known value or maximum value of the
expected power per unit frequency. In the case of a Gaussian non-white prior,
the $Q_k$ are given by (47),

$$Q_k^{(3)} = \frac{1}{(1/P_k) + \sum_{r} \beta_r^{(3)} c_{rk}} , \tag{50}$$

where the $P_k$ are prior estimates of the power spectrum. In all three cases
(48)-(50), the m Lagrangian multipliers $\beta_r^{(i)}$ are determined by the
requirement that the $Q_k^{(i)}$ satisfy the autocorrelation constraints

$$R_r = \sum_{k=1}^{n} Q_k^{(i)} c_{rk} \tag{51}$$

for r = 1,...,m and i = 1,2,3.

We begin by comparing the results (48) for a uniform white prior with
those (49) for a Gaussian white prior. Suppose that one of the
autocorrelation samples, say $R_1$, is for zero lag ($t_1 = 0$). Then (33)
shows that $c_{1k} = 2$ holds for all k. It follows that (49) can be written as

$$Q_k^{(2)} = \frac{1}{\sum_{r} \beta'_r c_{rk}} ,$$

where $\beta'_r = \beta_r^{(2)}$ for r = 2,...,m, and $\beta'_1 = \beta_1^{(2)} + 1/(2P)$.
Comparing with (48), we conclude that $Q_k^{(1)} = Q_k^{(2)}$ for all k. Thus,

a uniform  white  prior  and  a Gaussian  white  prior  yield  the  same  posterior 
power  spectrum  estimates  when  one  of  the  autocorrelation  samples  is  for  zero 
lag.  This  is  reasonable  since  the  zero-lag  sample  is  just  the  total  expected 
power  per  discrete  frequency,  and  since  the  Gaussian  white  prior  is  the  result 
of  minimizing  cross-entropy  with  respect  to  the  uniform  white  prior  given 
knowledge  of  the  total  expected  power  per  unit  frequency  (See  Section  III). 
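
The absorption of the $1/P$ term into the zero-lag multiplier can be checked numerically. The sketch below is an illustration only: the multiplier values, the value of P, and the choice of lags are arbitrary assumptions, and the lag index runs from 0 (the zero-lag sample) rather than from 1 as in the text.

```python
import math

n = 8          # number of discrete frequencies (arbitrary choice)
P = 3.0        # constant prior level of the Gaussian white prior (arbitrary)

# Assumed coefficients c_rk = 2 cos(pi r k / n); row 0 is the zero-lag sample,
# for which every coefficient equals 2.
c = [[2.0 * math.cos(math.pi * r * k / n) for k in range(1, n + 1)]
     for r in range(3)]

# Arbitrary multipliers for the Gaussian white case (49).
beta2 = [0.4, 0.05, -0.02]

# Shift only the zero-lag multiplier by 1/(2P); the others are unchanged.
beta1 = [beta2[0] + 1.0 / (2.0 * P)] + beta2[1:]

# Form (49), with the explicit 1/P term.
q2 = [1.0 / (1.0 / P + sum(beta2[r] * c[r][k] for r in range(3)))
      for k in range(n)]

# Form (48), with no prior term but the shifted zero-lag multiplier.
q1 = [1.0 / sum(beta1[r] * c[r][k] for r in range(3)) for k in range(n)]

max_diff = max(abs(a - b) for a, b in zip(q1, q2))
print(max_diff)
```

The two spectra agree to rounding error because the zero-lag coefficients are all equal to 2, so a shift of $1/(2P)$ in that multiplier contributes exactly $1/P$ to every denominator.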

On the other hand, the two results (48)-(49) are not equivalent in general
if there is no zero-lag autocorrelation sample. To see this, suppose there is
only one autocorrelation sample, $R_1 = \sum_k Q_k c_{1k}$, with $t_1 \neq 0$. If
$Q_k^{(1)} = Q_k^{(2)}$ were to hold for all k = 1,...,n, then

$$\beta_1^{(1)} c_{1k} = (1/P) + \beta_1^{(2)} c_{1k} \tag{51}$$

would have to hold for all k. But $\beta_1^{(1)}$ and $\beta_1^{(2)}$ are constants,
whereas the $c_{1k}$ vary with k since $t_1 \neq 0$ (see (33)). It follows that (51)
and, therefore, $Q_k^{(1)} = Q_k^{(2)}$ cannot hold for all k.

Now we compare the results (49) for a Gaussian white prior with the
results (50) for a Gaussian non-white prior. We consider the case of a single
autocorrelation sample $R_1 = \sum_k Q_k c_{1k}$ that may or may not be a
zero-lag sample. If $Q_k^{(2)} = Q_k^{(3)}$ were to hold for all k = 1,...,n,
then

$$(1/P) + \beta_1^{(2)} c_{1k} = (1/P_k) + \beta_1^{(3)} c_{1k} \tag{52}$$

would have to hold for all k. Since $\beta_1^{(2)}$ and $\beta_1^{(3)}$ are constants,
and since $P_k$ varies with k independently of $c_{1k}$, (52) cannot hold for
all k whether or not $c_{1k}$ is a constant (zero-lag sample).

The  results  of  the  foregoing  comparisons  may  be  summarized  as  follows: 

1)  The  results  for  a uniform  white  prior  and  a Gaussian  white  prior  will  be 
the  same  whenever  one  of  the  known  autocorrelation  samples  is  for  zero 
lag.  Since  (48)  is  just  the  MESA  result,  another  way  of  saying  this  is 
that  minimum  cross-entropy  spectral  analysis  and  MESA  are  equivalent  if 
there  is  no  prior  estimate  (other  than  uniform)  of  the  power  spectrum  and 
if  one  of  the  autocorrelation  samples  is  for  zero  lag. 

2)  If there is no zero-lag autocorrelation sample, the results for a uniform
white prior and a Gaussian white prior will not in general be the same.

3)  The results for a Gaussian non-white prior differ in general from those
of a Gaussian white prior and those of a uniform white prior, whether or
not one of the autocorrelation samples is for zero lag.


VII.  TOWARDS  EFFICIENT  ALGORITHMS 


Minimum cross-entropy spectral estimates based on autocorrelation samples
are given by (48)-(50), the particular form depending on what is known about
the signal prior to obtaining the autocorrelation samples. In all three
cases, one must solve for the Lagrangian multipliers $\beta_r$ in order to obtain
actual power spectrum values $Q_k$. This can be done by substituting whichever
of (48)-(50) is applicable into (51) and solving the m resulting equations for
the $\beta_r$. Unfortunately, this approach is unlikely to be suitable for
real-time signal processing although it has the advantage of considerable
generality. In particular, the results (48)-(50) and (51) make no assumptions
about the number and spacing of either frequencies or autocorrelation
samples.

In many real-time signal processing applications, it is reasonable to
assume that the frequencies $f_k$ are equally spaced, $f_k = k \Delta f$, and that
the autocorrelation samples $R(t_r)$ are equally spaced at the Nyquist rate,
$t_r = r/(2n \Delta f)$. In this case, the coefficients $c_{rk}$ become

$$c_{rk} = 2\cos(\pi r k/n) . \tag{53}$$

This  form  has  a variety  of  properties  that  can  lead  to  algorithmic 
simplifications, the fast Fourier transform being the best known example. In
the MESA case (48), the simplified form (53) leads to efficient algorithms
that are equivalent to autoregressive and linear prediction methods /10/, /19/.
It  seems  reasonable  to  expect  that  similar  efficient  techniques  can  also  be 
derived  from  (49)  and  (50). 
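
As a small numerical check of the simplification (53), the sketch below evaluates the coefficients both ways. It assumes the general form $c_{rk} = 2\cos(2\pi t_r f_k)$, which is not restated in this section, and the values of n, m, and $\Delta f$ are arbitrary; the function names are hypothetical.

```python
import math


def c_general(t, f):
    """Assumed general coefficient: 2 cos(2 pi t f)."""
    return 2.0 * math.cos(2.0 * math.pi * t * f)


def c_equal_spacing(r, k, n):
    """Simplified coefficient (53): 2 cos(pi r k / n)."""
    return 2.0 * math.cos(math.pi * r * k / n)


n, m, df = 16, 5, 0.37   # arbitrary sizes and frequency spacing

# With f_k = k * df and t_r = r / (2 n df), the spacing df cancels out.
max_err = max(
    abs(c_general(r / (2.0 * n * df), k * df) - c_equal_spacing(r, k, n))
    for r in range(1, m + 1)
    for k in range(1, n + 1)
)
print(max_err)
```

The frequency spacing cancels in the product $t_r f_k$, so under these assumptions the coefficients depend only on the indices r and k and on n.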

Even in cases where the form (53) does not apply, there is one
situation in which cross-entropy spectral analysis given a prior spectral
estimate and new autocorrelation samples may be particularly efficient.
Suppose that the prior spectral estimates $P_k$ are completely consistent with
the new autocorrelation samples, i.e., that $R_r = \sum_k P_k c_{rk}$ holds
(compare with (42) or (51)). In this case the multipliers $\beta_r$ will all
satisfy $\beta_r = 0$, and the posterior spectral estimates will not change from
the prior estimates, $Q_k = P_k$. This follows from a general property of
cross-entropy that $H(p,p) < H(q,p)$ for all $q \neq p$ (e.g., /29, p.14/), which
means that the minimum cross-entropy posterior will equal the prior whenever
the prior meets all constraints. Now suppose that the prior estimates $P_k$
were obtained by cross-entropy minimization given a prior set of
autocorrelation samples $R'_r$. The foregoing suggests that if the new
samples $R_r$ are close in value to the old samples $R'_r$, i.e., the
spectrum is slowly changing, then the multipliers $\beta_r$ will be close to
zero. As mentioned above, the multipliers are determined in general by
substituting (47) into (42) and solving the resulting equations. The result
of the substitution is

$$R_r = \sum_{k=1}^{n} P_k c_{rk} \Bigl(1 + P_k \sum_{j} \beta_j c_{jk}\Bigr)^{-1} . \tag{54}$$

If the $\beta_j$ are close to zero, we may expand this ignoring terms of $O(\beta^2)$
and higher, yielding

$$R_r = \sum_{k} \Bigl(P_k c_{rk} - P_k^2 c_{rk} \sum_{j} \beta_j c_{jk}\Bigr) . \tag{55}$$

However, the prior spectral estimates $P_k$ and the prior autocorrelation
samples $R'_r$ are related by $R'_r = \sum_k P_k c_{rk}$. It follows that we
may write (55) in the form

$$R'_r - R_r = \sum_{j=1}^{m} d_{rj} \beta_j , \tag{56}$$

where we have defined $d_{rj}$ as

$$d_{rj} = \sum_{k=1}^{n} P_k^2 c_{rk} c_{jk} . \tag{57}$$


Thus, in the case of a slowly changing autocorrelation function, (54) reduces
to  a set  of  linear  equations  (56),  whether  or  not  (53)  holds.  If  (53)  also 
holds,  then  the  computation  of  (57)  should  be  simpler,  thereby  making  the 
solution  of  (56)  even  more  efficient. 
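
The linearized update can be sketched in a few lines. This is an illustration under stated assumptions rather than an algorithm given in the text: it builds the matrix $d_{rj} = \sum_k P_k^2 c_{rk} c_{jk}$, solves the linear system $\sum_j d_{rj}\beta_j = R'_r - R_r$ by plain Gaussian elimination, and then evaluates the posterior $Q_k = 1/((1/P_k) + \sum_r \beta_r c_{rk})$. The function names and the example values are hypothetical.

```python
import math


def solve_linear(a, b):
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    m = len(b)
    aug = [list(a[i]) + [b[i]] for i in range(m)]
    for col in range(m):
        pivot = max(range(col, m), key=lambda i: abs(aug[i][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        for i in range(col + 1, m):
            factor = aug[i][col] / aug[col][col]
            for j in range(col, m + 1):
                aug[i][j] -= factor * aug[col][j]
    x = [0.0] * m
    for i in reversed(range(m)):
        s = sum(aug[i][j] * x[j] for j in range(i + 1, m))
        x[i] = (aug[i][m] - s) / aug[i][i]
    return x


def linearized_update(prior, c, r_old, r_new):
    """First-order multipliers from the linear system, then the new spectrum."""
    m, n = len(c), len(prior)
    # d_rj = sum_k P_k^2 c_rk c_jk
    d = [[sum(prior[k] ** 2 * c[r][k] * c[j][k] for k in range(n))
          for j in range(m)] for r in range(m)]
    # Right-hand side is old minus new autocorrelation values.
    beta = solve_linear(d, [r_old[r] - r_new[r] for r in range(m)])
    q = [1.0 / (1.0 / prior[k] + sum(beta[r] * c[r][k] for r in range(m)))
         for k in range(n)]
    return beta, q


# Hypothetical example: 4 frequencies, 2 lags, a small change in the samples.
n = 4
c = [[2.0 * math.cos(math.pi * r * k / n) for k in range(1, n + 1)]
     for r in range(2)]
prior = [1.0, 2.0, 1.5, 0.5]
r_old = [sum(prior[k] * c[r][k] for k in range(n)) for r in range(2)]
r_new = [r_old[0] + 1e-4, r_old[1] - 5e-5]
beta, q = linearized_update(prior, c, r_old, r_new)
resid = max(abs(sum(q[k] * c[r][k] for k in range(n)) - r_new[r])
            for r in range(2))
print(beta, resid)
```

For a small change in the autocorrelation samples, the constraint residual left by the linearized multipliers is of second order in the multipliers, so it is much smaller than the change itself. If the change is large, the neglected terms matter and the full nonlinear equations have to be solved instead.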


VIII.  CONCLUSIONS 


Depending  on  the  extent  to  which  a previous  estimate  is  available, 
cross-entropy  minimization  provides  various  alternatives  for  estimating  power 
spectra  given  autocorrelation  samples.  The  results  reduce  to  those  of  maximum 
entropy  spectral  analysis  (MESA)  in  a manner  that,  in  my  opinion,  provides  a 
better  derivation  of  MESA  than  previous  approaches.  When  a previous  power 
spectrum estimate $P_k$ is available, cross-entropy minimization leads to a new
estimate  that  differs  from  the  MESA  result.  Because  it  exploits  more 
information,  the  minimum  cross-entropy  estimate  should  be  better  than  the  MESA 
estimate.  This  possibility,  and  the  question  of  efficient  algorithms,  will  be 
explored in further work.